Robust Nonparametric Data Approximation of Point Sets via Data Reduction

Authors

  • Stephane Durocher
  • Alexandre Leblanc
  • Jason Morrison
  • Matthew Skala
Abstract

In this paper we present a novel non-parametric method of simplifying piecewise linear curves, and we apply this method as a statistical approximation of structure within sequential data in the plane. We consider the problem of minimizing the average length of sequences of consecutive input points that lie on any one side of the simplified curve. Specifically, given a sequence P of n points in the plane that determine a simple polygonal chain consisting of n−1 segments, we describe algorithms for selecting an ordered subset Q ⊂ P (including the first and last points of P) that determines a second polygonal chain to approximate P, such that the number of crossings between the two polygonal chains is maximized, and the cardinality of Q is minimized among all such maximizing subsets of P. Our algorithms have respective running times O(n log n) when P is monotonic and O(n log n) when P is an arbitrary simple polyline. Finally, we examine the application of our algorithms iteratively in a bootstrapping technique to define a smooth robust non-parametric approximation of the original sequence.
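
The objective stated in the abstract (maximize crossings, then minimize the size of Q) can be illustrated with a brute-force dynamic program. This is only a sketch under stated assumptions and is not the authors' algorithm: it assumes P is x-monotone, it counts a crossing as a change of side between consecutive input points lying strictly between two chosen vertices, it ignores crossings at shared vertices, and it runs in cubic time. The names `side`, `crossings`, and `simplify_max_crossings` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def side(a: Point, b: Point, p: Point) -> int:
    """Return +1 if p is left of the directed line a->b, -1 if right, 0 if on it."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)


def crossings(points: List[Point], i: int, j: int) -> int:
    """Count side changes of points i+1..j-1 relative to segment points[i]->points[j].

    Assumption: for an x-monotone chain, each side change corresponds to one
    crossing of the segment. Points exactly on the line are skipped.
    """
    count = 0
    prev = 0
    for k in range(i + 1, j):
        s = side(points[i], points[j], points[k])
        if s != 0:
            if prev != 0 and s != prev:
                count += 1
            prev = s
    return count


def simplify_max_crossings(points: List[Point]) -> Tuple[List[int], int]:
    """Pick indices of Q (always containing the first and last point).

    Maximizes the total crossing count and, among maximizers, uses the fewest
    vertices. Brute force for illustration only.
    """
    n = len(points)
    # best[j] = (total crossings, -(number of chosen vertices), predecessor index)
    best = [(0, -1, -1)] * n
    for j in range(1, n):
        candidates = []
        for i in range(j):
            total, neg_size, _ = best[i]
            candidates.append((total + crossings(points, i, j), neg_size - 1, i))
        # Tuple comparison: more crossings first, then fewer vertices.
        best[j] = max(candidates)
    # Walk predecessor links back from the last point.
    chosen = []
    j = n - 1
    while j != -1:
        chosen.append(j)
        j = best[j][2]
    chosen.reverse()
    return chosen, best[n - 1][0]


if __name__ == "__main__":
    zigzag = [(0, 0), (1, 1), (2, -1), (3, 1), (4, -1), (5, 0)]
    print(simplify_max_crossings(zigzag))
```

On a zigzag input such as the one in the usage lines, the single segment joining the first and last points already separates the interior points into alternating sides, so keeping only the two endpoints is optimal under this simplified crossing count.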

Similar resources

A New Approach for Determination of Neck-Pore Size Distribution of Porous Membranes via Bubble Point Data

Reliable estimation of the neck-pore size distribution (NPSD) of porous membranes is the key element in the design and operation of all membrane separation processes. In this paper, a new approach is presented for reliable estimation of the NPSD of porous membranes using wet flow-state bubble point test data. For this purpose, a robust method based on the linear regularization theory is developed to extract NPSD...

Asymptotic Behaviors of the Lorenz Curve for Left Truncated and Dependent Data

The purpose of this paper is to provide some asymptotic results for the nonparametric estimator of the Lorenz curve and Lorenz process in the case where the data are assumed to be strong mixing and subject to random left truncation. First, we show that the nonparametric estimator of the Lorenz curve is uniformly strongly consistent for the associated Lorenz curve. Also, a strong Gaussian approximation for ...

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: With advances in science, knowledge, and technology, we increasingly deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and their interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...

Leveraging the Power of Big Data for Robust Process Operations under Uncertainty

We propose a data-driven outlier-insensitive adaptive robust optimization framework that leverages big data in industries. A Bayesian nonparametric model – the Dirichlet process mixture model – is adopted to extract the information embedded within uncertainty data via a variational inference algorithm. We then devise data-driven uncertainty sets for adaptive robust optimization. This Bayesian n...

Stochastic Gradient Descent Methods for Estimation with Large Data Sets

We develop methods for parameter estimation in settings with large-scale data sets, where traditional methods are no longer tenable. Our methods rely on stochastic approximations, which are computationally efficient as they maintain one iterate as a parameter estimate, and successively update that iterate based on a single data point. When the update is based on a noisy gradient, the stochastic...

Publication date: 2012